Watch a Metropolis algorithm “get stuck” on the “donut” example. Discuss why this may be occurring.
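To see the "stuck" behavior concretely, here is a minimal random-walk Metropolis sketch in base R (not the lecture's demo code; the ring radius, width, and step size are illustrative choices) targeting a 2-D "donut": a density concentrated on a thin ring.

```r
## Random-walk Metropolis on a thin annulus ("donut") of radius 3.
## The ring is narrow (sd 0.05), so large proposals almost always land
## off the ring and get rejected -- the chain gets stuck.
set.seed(1)

log_donut <- function(x) {
  r <- sqrt(sum(x^2))
  -(r - 3)^2 / (2 * 0.05^2)  # log density: Gaussian in the radius
}

metropolis <- function(n, step, init = c(3, 0)) {
  chain <- matrix(NA_real_, n, 2)
  x <- init
  accepted <- 0
  for (i in seq_len(n)) {
    prop <- x + rnorm(2, sd = step)            # symmetric proposal
    if (log(runif(1)) < log_donut(prop) - log_donut(x)) {
      x <- prop
      accepted <- accepted + 1
    }
    chain[i, ] <- x
  }
  list(chain = chain, accept_rate = accepted / n)
}

out <- metropolis(5000, step = 1)
out$accept_rate  # very low: most proposals fall off the ring
```

With a large step the chain barely moves (almost everything is rejected); shrinking the step raises the acceptance rate but then the chain diffuses around the ring very slowly. Either way, a gradient-blind random walk handles this geometry poorly.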
Excellent reference text: Brooks, S., Gelman, A., Jones, G., & Meng, X. L. (Eds.). (2011). Handbook of Markov Chain Monte Carlo. CRC Press.
From MacKay, David J. C. Information Theory, Inference and Learning Algorithms. Cambridge University Press, 2003.
“The HMC method is a Metropolis method, applicable to continuous state spaces, that makes use of gradient information to reduce random walk behavior… It seems wasteful to use a simple random-walk Metropolis method when this gradient is available - the gradient indicates which direction one should go in to find states that have higher probability!”
See MacKay, Section 30.1 for a basic implementation of HMC and mathematical details. Let’s return to the MCMC demonstration and I’ll discuss the intuition.
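In the spirit of MacKay's basic implementation, here is a minimal one-parameter HMC sketch in base R (the target, step size `eps`, and number of leapfrog steps `L` are illustrative choices, not MacKay's exact code): simulate Hamiltonian dynamics with the leapfrog integrator, then accept or reject with a Metropolis step on the total energy.

```r
## Minimal HMC for a standard-normal target.
set.seed(2)
U      <- function(q) sum(q^2) / 2  # potential energy = -log density
grad_U <- function(q) q             # its gradient

hmc_step <- function(q, eps = 0.1, L = 20) {
  p <- rnorm(length(q))                        # resample momentum
  q_new <- q
  p_new <- p - eps * grad_U(q_new) / 2         # half step for momentum
  for (l in seq_len(L)) {
    q_new <- q_new + eps * p_new               # full step for position
    if (l < L) p_new <- p_new - eps * grad_U(q_new)
  }
  p_new <- p_new - eps * grad_U(q_new) / 2     # final half step
  ## Metropolis accept/reject on the change in total energy H = U + K
  dH <- (U(q) + sum(p^2) / 2) - (U(q_new) + sum(p_new^2) / 2)
  if (log(runif(1)) < dH) q_new else q
}

q <- 0
draws <- numeric(2000)
for (i in seq_along(draws)) {
  q <- hmc_step(q)
  draws[i] <- q
}
c(mean = mean(draws), sd = sd(draws))  # should be close to 0 and 1
```

The gradient steers each trajectory toward higher-probability states, which is exactly the information the random-walk Metropolis sampler ignores.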
Derived from
Gelman, Andrew, Daniel Lee, and Jiqiang Guo. “Stan: A probabilistic programming language for Bayesian inference and optimization.” Journal of Educational and Behavioral Statistics 40.5 (2015): 530-543.
"Stan was motivated by the desire to solve problems that could not be solved in reasonable time (user programming time plus run time) using other packages.
In comparing Stan to other software options, we consider several criteria: 1. Flexibility, that is, being able to fit the desired model. 2. Ease of use; user programming time. 3. Run time. 4. Scalability as dataset and model grow larger."
rstan for multilevel modeling of the eight schools data: http://mc-stan.org/rstan/articles/rstan.html
- Stan and rstan provide fine control over HMC sampling for customized and elaborate models.
- The rstanarm package provides a glmer-style interface for rapid model development: glmer-style syntax to do NUTS HMC. It complements the arm package and should be a solid choice for your last HW.
- The rethinking package also has a user-friendly interface for specifying Stan models (see the ulam function below).
- greta: high-performance Bayesian inference software.
## install from github using devtools
## library(devtools); devtools::install_github("rmcelreath/rethinking")
## https://github.com/rmcelreath/rethinking
suppressMessages(library(rethinking))
y <- c(-1, 1)
set.seed(11)
m9.2 <- ulam(
  alist(
    y ~ dnorm(mu, sigma),
    mu <- alpha,
    alpha ~ dnorm(0, 1000),
    sigma ~ dexp(0.0001)
  ),
  data = list(y = y), chains = 2)
SAMPLING FOR MODEL '726d002e27cec1633082261fcfedb813' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 8e-06 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.08 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 1000 [ 0%] (Warmup)
Chain 1: Iteration: 100 / 1000 [ 10%] (Warmup)
Chain 1: Iteration: 200 / 1000 [ 20%] (Warmup)
Chain 1: Iteration: 300 / 1000 [ 30%] (Warmup)
Chain 1: Iteration: 400 / 1000 [ 40%] (Warmup)
Chain 1: Iteration: 500 / 1000 [ 50%] (Warmup)
Chain 1: Iteration: 501 / 1000 [ 50%] (Sampling)
Chain 1: Iteration: 600 / 1000 [ 60%] (Sampling)
Chain 1: Iteration: 700 / 1000 [ 70%] (Sampling)
Chain 1: Iteration: 800 / 1000 [ 80%] (Sampling)
Chain 1: Iteration: 900 / 1000 [ 90%] (Sampling)
Chain 1: Iteration: 1000 / 1000 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 0.041044 seconds (Warm-up)
Chain 1: 0.030275 seconds (Sampling)
Chain 1: 0.071319 seconds (Total)
Chain 1:
SAMPLING FOR MODEL '726d002e27cec1633082261fcfedb813' NOW (CHAIN 2).
Chain 2:
Chain 2: Gradient evaluation took 3e-06 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.03 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2:
Chain 2:
Chain 2: Iteration: 1 / 1000 [ 0%] (Warmup)
Chain 2: Iteration: 100 / 1000 [ 10%] (Warmup)
Chain 2: Iteration: 200 / 1000 [ 20%] (Warmup)
Chain 2: Iteration: 300 / 1000 [ 30%] (Warmup)
Chain 2: Iteration: 400 / 1000 [ 40%] (Warmup)
Chain 2: Iteration: 500 / 1000 [ 50%] (Warmup)
Chain 2: Iteration: 501 / 1000 [ 50%] (Sampling)
Chain 2: Iteration: 600 / 1000 [ 60%] (Sampling)
Chain 2: Iteration: 700 / 1000 [ 70%] (Sampling)
Chain 2: Iteration: 800 / 1000 [ 80%] (Sampling)
Chain 2: Iteration: 900 / 1000 [ 90%] (Sampling)
Chain 2: Iteration: 1000 / 1000 [100%] (Sampling)
Chain 2:
Chain 2: Elapsed Time: 0.041869 seconds (Warm-up)
Chain 2: 0.048629 seconds (Sampling)
Chain 2: 0.090498 seconds (Total)
Chain 2:
mean sd 5.5% 94.5% n_eff Rhat4
alpha -23.55871 373.4569 -676.254615 550.0968 71.69881 1.0206676
sigma 568.94576 1210.7862 8.826338 2309.5151 150.22651 0.9997524
set.seed(11)
m9.3 <- ulam(
  alist(
    y ~ dnorm(mu, sigma),
    mu <- alpha,
    ## even include a "bad" starting point
    alpha ~ dnorm(1, 10),
    sigma ~ dexp(1)
  ),
  data = list(y = y), chains = 2)
SAMPLING FOR MODEL 'db8b93ccfa83872ce482c35ebed2c618' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 1.2e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.12 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 1000 [ 0%] (Warmup)
Chain 1: Iteration: 100 / 1000 [ 10%] (Warmup)
Chain 1: Iteration: 200 / 1000 [ 20%] (Warmup)
Chain 1: Iteration: 300 / 1000 [ 30%] (Warmup)
Chain 1: Iteration: 400 / 1000 [ 40%] (Warmup)
Chain 1: Iteration: 500 / 1000 [ 50%] (Warmup)
Chain 1: Iteration: 501 / 1000 [ 50%] (Sampling)
Chain 1: Iteration: 600 / 1000 [ 60%] (Sampling)
Chain 1: Iteration: 700 / 1000 [ 70%] (Sampling)
Chain 1: Iteration: 800 / 1000 [ 80%] (Sampling)
Chain 1: Iteration: 900 / 1000 [ 90%] (Sampling)
Chain 1: Iteration: 1000 / 1000 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 0.012427 seconds (Warm-up)
Chain 1: 0.009018 seconds (Sampling)
Chain 1: 0.021445 seconds (Total)
Chain 1:
SAMPLING FOR MODEL 'db8b93ccfa83872ce482c35ebed2c618' NOW (CHAIN 2).
Chain 2:
Chain 2: Gradient evaluation took 3e-06 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.03 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2:
Chain 2:
Chain 2: Iteration: 1 / 1000 [ 0%] (Warmup)
Chain 2: Iteration: 100 / 1000 [ 10%] (Warmup)
Chain 2: Iteration: 200 / 1000 [ 20%] (Warmup)
Chain 2: Iteration: 300 / 1000 [ 30%] (Warmup)
Chain 2: Iteration: 400 / 1000 [ 40%] (Warmup)
Chain 2: Iteration: 500 / 1000 [ 50%] (Warmup)
Chain 2: Iteration: 501 / 1000 [ 50%] (Sampling)
Chain 2: Iteration: 600 / 1000 [ 60%] (Sampling)
Chain 2: Iteration: 700 / 1000 [ 70%] (Sampling)
Chain 2: Iteration: 800 / 1000 [ 80%] (Sampling)
Chain 2: Iteration: 900 / 1000 [ 90%] (Sampling)
Chain 2: Iteration: 1000 / 1000 [100%] (Sampling)
Chain 2:
Chain 2: Elapsed Time: 0.01224 seconds (Warm-up)
Chain 2: 0.010001 seconds (Sampling)
Chain 2: 0.022241 seconds (Total)
Chain 2:
mean sd 5.5% 94.5% n_eff Rhat4
alpha 0.08552607 1.1880916 -1.6238445 2.105847 301.2610 1.003915
sigma 1.57960741 0.8229531 0.6827578 3.088258 247.0879 1.005284
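The contrast between the two precis tables comes down to the priors. A quick base-R check (not part of the lecture code) shows how much prior mass m9.2's nearly flat priors put on absurd parameter values, compared with m9.3's mildly informative ones:

```r
## sigma priors: Exponential(0.0001) vs Exponential(1)
qexp(0.99, rate = 1e-4)  # 99th percentile ~ 46,052 -- sigma can wander huge
qexp(0.99, rate = 1)     # 99th percentile ~ 4.6

## alpha priors: Normal(0, 1000) vs Normal(1, 10)
qnorm(0.99, 0, 1000)     # ~ 2,326
qnorm(0.99, 1, 10)       # ~ 24.3
```

With only two observations, m9.2's priors let the chains roam over enormous regions (hence the wild estimates, low n_eff, and Rhat above 1), while m9.3's priors keep the sampler in a sensible range.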
Prof. Andrew Gelman says to heed it.
This is a good workflow to employ:
Gabry et al., "Visualization in Bayesian workflow."
The four basic steps are
In what situations do we need to use more advanced and efficient MC algorithms?